ABSTRACT

We present a simple algorithm for maintaining a replicated
distributed dictionary which achieves high availability of data,
rapid processing of atomic actions, efficient utilization of
storage, and tolerance to node or network failures including lost or
duplicated messages. It does not require transaction logs,
synchronized clocks, or other complicated mechanisms for its
operation. It achieves consistency constraints which are
considerably weaker than serial consistency but nonetheless are
adequate for many dictionary applications such as electronic
appointment calendars and mail systems. The degree of consistency
achieved depends on the particular history of operation of the
system in a way that is intuitive and easily understood. The
algorithm implements a "best effort" approximation to full serial
consistency, relative to whatever internode communication has
successfully taken place, so the semantics are fully specified even
under partial failure of the system. Both the correctness of the
algorithm and the utility of such weak semantics depend heavily on
special properties of the dictionary operations.

1. Introduction

A common axiom taken for the correctness of a database system is
that the transactions be serializable, that is, the results of any
sequence of transactions should be the same as if they had been
performed in some serial order [3, 5, 17, 19]. Serializability
insures consistency of the database when concurrent transactions are
being processed, assuming only that each transaction is correct when
run alone.

Achieving serial consistency in an unreliable distributed
environment is considerably more difficult than in a central
database, and much work has been done addressing this problem
[2, 4, 9, 20]. (Cf. [Id] for a nice survey of some of the issues.)
Reasons for distributing data in the first place are to increase
speed of access and to insure availability of data even when
individual nodes or the network itself fails. Both of these goals
require replication of the data, which introduces the new problem of
keeping the replicated copies up-to-date. (Cf. [7, 8, 13, 15, 21].)

Unfortunately, the two goals of availability and serial consistency
stand somewhat in conflict. For example, availability dictates that
every node with a copy of the database be permitted to continue
performing transactions on its local copy even when the network
fails. Serializability, on the other hand, requires that at most one
such node be allowed to proceed under such conditions, for otherwise
the copies begin diverging and reads can return values inconsistent
with any serial ordering of the transactions.

Several authors have noted that meaningful results can often be
obtained even without serial consistency when additional information
about the particular transactions is available [10, 11, 111. Also,
strict serializability is often not required for read-only
transactions K, 15]. We present an example of a problem which is
adequately served by a solution satisfying much weaker conditions
and give an algorithm for its solution. Our algorithm achieves high
availability of data, rapid processing of atomic actions, efficient
utilization of storage, and tolerance to node or network failures
including lost or duplicated messages. It does not require
transaction logs, synchronized clocks, or other complicated
mechanisms for its operation.

The degree of consistency achieved by our solution depends on the
particular history of operation of the system in a way that is
intuitive and easily understood. The algorithm implements a "best
effort" approximation to full serial consistency, relative to
whatever internode communication has successfully taken place, so
the semantics are fully specified even under partial failure of the
system.

Johnson and Thomas [10] give an algorithm for a similar problem
which uses timestamps to serialize updates (cf. [14]) but permits
arbitrary reading. While it enjoys many of the same advantages of
our algorithm, it requires deleted dictionary entries to be retained
until all processes have updated their copies of the database. Also,
their read semantics are somewhat weaker than ours.

Algorithms such as [7, 21] which use voting schemes are able to
provide both serial consistency and data availability despite
limited node failures, but like all serializable algorithms, updates
in all but one subset are disallowed when the network becomes
partitioned.

2.  Distributed  Dictionaries 

Abstractly, our problem is to maintain a database consisting of a
dictionary, that is, a set of elements with two update operations
INSERT and DELETE, and a single query operation LIST (cf. [1]).
INSERT(x) adds element x to the set, DELETE(x) removes x from the
set if it was there and does nothing otherwise, and LIST returns an
enumeration of the elements currently in the set. All three
operations are considered to be atomic transactions.

The database is to be implemented on an unreliable network of
processors. Our goal is to make the database highly available, even
under conditions in which individual nodes and the network are not
always operational. By "available", we mean that any operational
node should be able to perform any of the basic database operations
at any time, regardless of the status of the rest of the system.

Each node maintains its own copy or view of the database, and all
operations are performed initially only on the node's local view.
From time to time a node sends information about its view to one or
more other nodes. A node receiving such information then updates its
own view. We have in effect added two new operations: SEND(m) and
RECEIVE(m), where m is the message. As more and more messages are
sent, information is thus propagated throughout the network, and the
individual views of the data tend to converge to the view that would
be "correct" were this all taking place in a centralized database.

Our notion of correctness depends not only on the particular update
and query operations requested by the users of the system but also
on the internal communications that have taken place, about which we
make no assumptions. The intention is that in a correctly
functioning system, enough communication will take place so that
every node of the system will know about an insertion or deletion
shortly after it occurs, and no view will be far out of date.
However, our correctness condition is simply that an element x is in
node i's view iff i knows of its insertion but does not know of its
deletion.

We place two restrictions on the problem:

R1. We assume that there is at most one occurrence of the operation
    INSERT(x) for each element x, so that once an element has been
    deleted from the set, it can never again be reinserted.

R2. DELETE(x) is only legal at a node j if x is currently in j's
    view.

We need both restrictions for technical reasons. Among other things,
they insure that INSERT(x) can never follow DELETE(x), so if a node
discovers that both operations were performed sometime in the past,
then x definitely does not belong in its view. Also, both
restrictions arise naturally in many applications. One way to
enforce restriction R1 is to tag the actual datum with a "timestamp"
which uniquely identifies the particular insertion. Thus, two
attempts to insert the same datum will in fact give rise to two
different elements x and x' with different timestamps. Restriction
R2 is natural in applications where the only way to specify an
argument to DELETE is to "point" at the element among the ones in
the current view. Such is generally the case, for example, when the
elements are tagged with timestamps. Note that we do permit several
deletions of the same element; their effect is the same as a single
one.
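
The tagging idea can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a tag is the pair of a node id and a per-node counter; the names `Element` and `Tagger` are hypothetical and do not appear in the text.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Element:
    """A datum plus the tag that makes each insertion unique (restriction R1)."""
    datum: str
    node: int   # node at which the insertion happened
    stamp: int  # value of that node's local counter at insertion time


class Tagger:
    """Hands out strictly increasing stamps for one node (hypothetical helper)."""

    def __init__(self, node: int) -> None:
        self.node = node
        self._counter = count(1)

    def tag(self, datum: str) -> Element:
        return Element(datum, self.node, next(self._counter))


tagger = Tagger(node=1)
x = tagger.tag("staff meeting")
x_prime = tagger.tag("staff meeting")  # same datum, second insertion

view = {x, x_prime}
view.discard(x)  # DELETE "points" at one tagged element of the view (R2)
```

Because the two insertions carry different stamps, they are distinct elements, so deleting one leaves the other in the view and neither is ever "reinserted".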

This abstract problem was motivated by the practical problem of
building a highly available electronic appointment calendar. Here
the data items are individual appointments, and an appointment
calendar is just a set of appointments. A user can read and modify
his appointments from any node. He will see every appointment that
it is possible to see, given the interprocess communication that has
actually taken place. In a fully working system he would see all but
possibly very recently entered appointments. Anything he can see he
can manipulate as if he were working on a centralized system.
Finally, any changes he makes will be reflected at the other nodes
when the system is again working, even if the network happens to be
unavailable while he is actually doing the modifications. Note that
because the views are not always up-to-date, conflicting
appointments may not be discovered immediately. Hence, it is
necessary for the calendar system to be able to handle conflicts at
times other than when an appointment is first entered. (This is
probably a desirable property anyway.)

Other places where this problem arises are in distributed mail
systems and distributed file directory systems, both of which
abstractly just maintain dictionaries. In a distributed mail system,
our solution could simplify the usual network mailer. The network
mailer would only have to deliver a message to one of a user's
mailboxes; the distribution of mail to the user's other mailboxes
would then be handled by our algorithm. Indeed, if the recipient had
a local mailbox, then only the local mailer would be needed and the
network mailer would not have to be invoked at all.

3. Formal Problem Statement

For each natural number N, let $[N] = \{1, 2, \ldots, N\}$. Let D be
the domain of elements. Let
$OP = \{\text{'INSERT(x)'}, \text{'DELETE(x)'} \mid x \in D\} \cup \{\text{'LIST'}\} \cup \{\text{'SEND(m)'}, \text{'RECEIVE(m)'} \mid m \text{ is a message}\}$.
We formulate our correctness conditions in terms of a partial order
of events which represents the history of information flow in the
system.

Fix a particular execution of the system. Each instance of an
operation $\xi \in OP$ corresponds to an event e, where
$\mathrm{op}(e) = \xi$ and $\mathrm{node}(e)$ is the node at which e
occurs. Let E be the set of all events occurring in the execution,
and let
$D(E) = \{x \in D \mid \mathrm{op}(e) = \mathrm{INSERT}(x) \text{ for some } e \in E\}$.
E is partially ordered by $\to$, which is the least reflexive and
transitive relation such that:

P1. Events at the same node are totally ordered;

P2. If $e_1 = \mathrm{SEND}(m)$ and $e_2 = \mathrm{RECEIVE}(m)$ for
    the same message m, then $e_1 \to e_2$.

We now formalize a correct view of the database. We represent our
notion of "knows about" by $\to$; hence, when i has just performed
event $e'$, it knows about an event e iff $e \to e'$. Let
$\mathrm{view}: E \to 2^D$ be defined as follows:
$x \in \mathrm{view}(e')$ iff

V1. there is an event e such that $e \to e'$ and
    $\mathrm{op}(e) = \mathrm{INSERT}(x)$, and

V2. for every event e, if $\mathrm{op}(e) = \mathrm{DELETE}(x)$,
    then $e \not\to e'$.

We now define the N-node redundant dictionary problem to be the
problem of finding a distributed algorithm on N nodes such that each
node can process the operations of INSERT, DELETE, LIST, SEND and
RECEIVE, subject to restrictions R1 and R2, and each node i
maintains a correct view of the data $V_i$. That is, in the partial
order of events corresponding to the history of operations in the
system, if e is an event at node i, then just after the occurrence
of that event, $V_i = \mathrm{view}(e)$.
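
As an illustration of the specification (not of the algorithm), the following Python sketch computes view(e') directly from a recorded history using conditions P1, P2, V1 and V2. The `Event` record, the list-based history, and the use of message identifiers to match SEND with RECEIVE are assumptions made for this sketch only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    node: int
    op: str                     # 'INSERT', 'DELETE', 'LIST', 'SEND', 'RECEIVE'
    arg: Optional[str] = None   # element for INSERT/DELETE, message id for SEND/RECEIVE


def happens_before(events):
    """Return the set of index pairs (a, b) with events[a] -> events[b].

    Assumes `events` is listed in an order consistent with each node's
    local order and that every RECEIVE appears after its SEND.
    """
    n = len(events)
    reach = {(a, a) for a in range(n)}                      # reflexive
    for a in range(n):
        for b in range(a + 1, n):
            ea, eb = events[a], events[b]
            same_node = ea.node == eb.node                  # P1
            delivery = (ea.op == 'SEND' and eb.op == 'RECEIVE'
                        and ea.arg == eb.arg)               # P2
            if same_node or delivery:
                reach.add((a, b))
    for k in range(n):                                      # transitive closure
        for a in range(n):
            for b in range(n):
                if (a, k) in reach and (k, b) in reach:
                    reach.add((a, b))
    return reach


def view(events, target):
    """Elements in view(events[target]) according to V1 and V2."""
    reach = happens_before(events)
    known = [events[a] for a in range(len(events)) if (a, target) in reach]
    inserted = {e.arg for e in known if e.op == 'INSERT'}   # V1
    deleted = {e.arg for e in known if e.op == 'DELETE'}    # V2
    return inserted - deleted


history = [
    Event(1, 'INSERT', 'x'),
    Event(1, 'SEND', 'm1'),
    Event(2, 'INSERT', 'y'),
    Event(2, 'RECEIVE', 'm1'),
    Event(2, 'DELETE', 'x'),
    Event(1, 'LIST'),
]
```

In this history node 2 sees both x and y once it receives m1 and then removes x, while node 1's final LIST still shows x, because no event at node 2 has reached node 1.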

4. The Algorithm

An obvious solution to our dictionary problem is the following: Each
node i maintains two sets, $I_i$ and $D_i$, which are the sets of
elements that node i knows have been inserted and deleted,
respectively. i's view of the dictionary is $V_i = I_i - D_i$. To
implement SEND(m), node j sends a message m containing $I_j$ and
$D_j$. When a node i does a RECEIVE(m), it updates its own sets
simply by taking unions.
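
A minimal Python sketch of this obvious solution follows. The class and method names are illustrative assumptions; only the sets I and D and the union rule come from the description above.

```python
class NaiveNode:
    """Keeps every element known to be inserted (I) and known to be deleted (D)."""

    def __init__(self) -> None:
        self.inserted = set()   # I_i
        self.deleted = set()    # D_i

    def insert(self, x) -> None:
        self.inserted.add(x)

    def delete(self, x) -> None:
        if x in self.list_view():           # restriction R2
            self.deleted.add(x)

    def list_view(self):
        return self.inserted - self.deleted  # V_i = I_i - D_i

    def send(self):
        return (frozenset(self.inserted), frozenset(self.deleted))

    def receive(self, message) -> None:
        other_inserted, other_deleted = message
        self.inserted |= other_inserted      # take unions
        self.deleted |= other_deleted


a, b = NaiveNode(), NaiveNode()
a.insert('x')
b.receive(a.send())
b.delete('x')
a.receive(b.send())
```

After the exchange both views are empty, yet both nodes still store x in their I and D sets.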

The drawback to this solution is that the set $I_i \cup D_i$
contains every element that has ever been in i's view, and this set
grows without bound, even if the size of the view is itself bounded.
Our algorithm gets by with keeping only the current view, $V_i$, and
a small amount of additional information which will be described
shortly.

Clearly, it won't do to update $V_i$ by replacing it with
$V_i \cup V_j$, for there can be two reasons why an element
$x \in V_i \cup V_j$ might be missing from one of the sets $V_k$,
$k \in \{i, j\}$:

1. x used to be in $V_k$ but it has since been deleted, or

2. x was inserted so recently that node k has not yet heard about
   it.

In case 2, x belongs in $V_i$ (and in $V_j$, too), and in case 1, it
should be in neither.
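
A tiny worked example of the two cases, written in Python with plain sets; the element names and the scenario are made up for illustration.

```python
v_i = {'x'}        # node i still holds x and has not heard of y
v_j = {'y'}        # node j deleted x earlier and has just inserted y

merged = v_i | v_j               # naive update: brings the deleted x back
missing_from_j = merged - v_j    # x is absent from V_j because of case 1
missing_from_i = merged - v_i    # y is absent from V_i because of case 2
correct = {'y'}                  # what both views should become
```

Set membership alone gives the same evidence in both cases (an element present in one view and absent from the other), which is why extra bookkeeping is needed.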


In order to be able to distinguish these two cases, each node
maintains the following information in addition to its current view
of the database:

1. Each node i has a local "$\mathrm{clock}_i$". Each reference to
   $\mathrm{clock}_i$ returns a positive number that is larger than
   all previous values returned. ($\mathrm{clock}_i$ can be
   implemented by a physical clock or by a counter that gets
   incremented on each reference. We talk about the values of
   $\mathrm{clock}_i$ as being "times", but they need bear no
   relation to real time nor to the values of $\mathrm{clock}_j$ for
   any $j \ne i$.)

2. Each x in the view is tagged with a pair
   $\langle \mathrm{cre}_x, T_x \rangle$, where $\mathrm{cre}_x$,
   the "creator" of x, is the node at which x was originally
   inserted, and $T_x$ is the time, according to the clock of
   $\mathrm{cre}_x$, at which the insertion took place.

3. Each node i maintains a table $t_i$. $t_i(j)$ is i's posting time
   for insertions which took place at node j.

The posting times tell how current i's knowledge is about insertions
that have occurred at the other nodes: i knows about an insertion at
node j iff the time at which that operation occurred (according to
$\mathrm{clock}_j$) is $\le t_i(j)$.

Given a view V, a posting time vector t, and an element x, we define
a predicate:

    $\mathrm{del}(V, t, x)$ iff $x \notin V$ and $T_x \le t(\mathrm{cre}_x)$.
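
The predicate above can be sketched in Python as follows. The `Tagged` record and the dictionary used for the posting-time vector are assumptions of this sketch; the condition itself is the one just defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    name: str
    cre: int   # creator node, cre_x
    T: int     # insertion time on the creator's clock, T_x


def is_deleted(view, posting, x):
    """del(V, t, x): x is absent from V although its insertion is known."""
    return x not in view and x.T <= posting[x.cre]


x = Tagged('x', cre=1, T=3)
y = Tagged('y', cre=2, T=5)

view = set()              # neither element is in the view
posting = {1: 4, 2: 2}    # insertions known up to time 4 at node 1, time 2 at node 2
```

Here `is_deleted(view, posting, x)` holds, since the insertion of x (time 3 at node 1) is covered by the posting time 4 and x is nevertheless absent. It does not hold for y, whose insertion time 5 lies beyond the posting time 2, so y is simply not yet known.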

It will follow that $\mathrm{del}(V_i, t_i, x)$ holds iff node i
knows that x has been deleted. We now describe how node i processes
each of the kinds of operations.

Algorithm

INSERT(x):   $t_i(i) := \mathrm{clock}_i$;
             $\mathrm{cre}_x := i$;
             $T_x := t_i(i)$;
             $V_i := V_i \cup \{x\}$.

DELETE(x):   $V_i := V_i - \{x\}$.

LIST:        Return $V_i$.

SEND(m):     Send the message $m = \langle V_i, t_i \rangle$.

RECEIVE(m):  Let m be the message $\langle V, t \rangle$;
             $V_i := \{ x \in (V_i \cup V) \mid \neg\mathrm{del}(V_i, t_i, x) \text{ and } \neg\mathrm{del}(V, t, x) \}$;
             $t_i(k) := \max(t_i(k), t(k))$ for all $k \in [N]$.

Initially, $t_i(j) = \mathrm{clock}_i = 0$ and $V_i = \emptyset$ for
all i, j.

5. Proof of Correctness

Before stating and proving the correctness of this algorithm, we
need some more notation. For each event $e \in E$, let $V[e]$
(respectively $t[e]$) be the value of $V_{\mathrm{node}(e)}$
(respectively $t_{\mathrm{node}(e)}$) immediately after completing
e. Let $\mathrm{ins}_x[e]$ be the predicate
$T_x \le t[e](\mathrm{cre}_x)$. Let
$\mathrm{del}_x[e] = \mathrm{del}(V[e], t[e], x)$. Note that
$\mathrm{del}_x[e]$ iff $x \notin V[e]$ and $\mathrm{ins}_x[e]$. We
will show that $V[e]$ corresponds to the current view,
$\mathrm{ins}_x[e]$ means that x is known to have been inserted and
$\mathrm{del}_x[e]$ means that x is known to have been deleted.

Let $e \to_1 e'$ iff $e \to e'$, $e \ne e'$, and for all $e''$, if
$e \to e'' \to e'$, then $e'' = e$ or $e'' = e'$. If $e \to_1 e'$,
we say that e is an immediate predecessor of $e'$ and $e'$ is an
immediate successor of e.

Lemma 1: If $e \to e'$, then $t[e](i) \le t[e'](i)$.

Thus, posting times are monotone over $\to$.

Proof: Obvious by inspection of the algorithm and the conditions on
$\mathrm{clock}_i$. □

Lemma 2: If $x \in V[e']$, then there exists $e \in E$ such that
$\mathrm{op}(e) = \mathrm{INSERT}(x)$ and $e \to e'$.

Proof: This follows by an easy induction on $\to$, using the fact
that initially all $V_i = \emptyset$. □

Lemma 3: Let $x \in D(E)$, $e' \in E$. Then $\mathrm{ins}_x[e']$ iff
there exists $e \in E$ such that
$\mathrm{op}(e) = \mathrm{INSERT}(x)$ and $e \to e'$.

Proof: $\Rightarrow$: Assume $\mathrm{ins}_x[e']$ and $x \in D(E)$.
First, $t[e'](\mathrm{cre}_x) \ge T_x > 0$. Let $e''$ be minimal in
E such that $e'' \to e'$ and
$t[e''](\mathrm{cre}_x) = t[e'](\mathrm{cre}_x)$. Inspection of the
code shows that $\mathrm{node}(e'') = \mathrm{cre}_x$ and
$\mathrm{op}(e'') = \mathrm{INSERT}(y)$ for some $y \in D$, for in
every other case, at least one immediate predecessor f of $e''$ has
$t[f](\mathrm{cre}_x) = t[e''](\mathrm{cre}_x)$, contrary to the
minimality of $e''$. Since $x \in D(E)$, there exists $e \in E$ with
$\mathrm{op}(e) = \mathrm{INSERT}(x)$ and
$\mathrm{node}(e) = \mathrm{cre}_x$. By condition P1, either
$e \to e''$ or $e'' \to e$. If $e'' \to e$, we have
$T_x \le t[e'](\mathrm{cre}_x) = t[e''](\mathrm{cre}_x) < t[e](\mathrm{cre}_x) = T_x$
(since $\mathrm{op}(e) = \mathrm{INSERT}(x)$), a contradiction.
Hence, $e \to e'' \to e'$.

$\Leftarrow$: Immediate by the code for INSERT and Lemma 1. □

Lemma 4: If $e'' \to e'$ and $\mathrm{del}_x[e'']$, then
$\mathrm{del}_x[e']$.

Proof: It suffices to show that if $e'' \to_1 e'$ and
$\mathrm{del}_x[e'']$, then $\mathrm{del}_x[e']$. Since
$\mathrm{del}_x[e'']$, then $x \notin V[e'']$ and
$\mathrm{ins}_x[e'']$. By inspection of the code and restriction R1,
$x \notin V[e']$. By Lemma 1, $\mathrm{ins}_x[e']$, so
$\mathrm{del}_x[e']$ holds. □

Lemma 5: Let $x \in D(E)$, $e' \in E$. Then $\mathrm{del}_x[e']$ iff
there exists $e'' \in E$ such that
$\mathrm{op}(e'') = \mathrm{DELETE}(x)$ and $e'' \to e'$.

Proof: $\Rightarrow$: Assume $\mathrm{del}_x[e']$ holds. Let
$e'' \in E$ be minimal such that $\mathrm{del}_x[e'']$ and
$e'' \to e'$. Then $x \notin V[e'']$ and $\mathrm{ins}_x[e'']$, so
by Lemma 3, there exists $e \in E$ such that
$\mathrm{op}(e) = \mathrm{INSERT}(x)$ and $e \to e''$. Let f be such
that $e \to f \to_1 e''$ (possible since $e \ne e''$).
$\neg\mathrm{del}_x[f]$ by minimality of $e''$, and
$\mathrm{ins}_x[f]$ by Lemma 3; hence, $x \in V[f]$. Since
$x \notin V[e'']$, then $\mathrm{op}(e'')$ is DELETE(x) or
RECEIVE(m) for some m. However, if
$\mathrm{op}(e'') = \mathrm{RECEIVE}(m)$, then $x \in V[e'']$ by the
code for RECEIVE (since $\neg\mathrm{del}_x$ holds for all
predecessors of $e''$), a contradiction. We conclude that
$\mathrm{op}(e'') = \mathrm{DELETE}(x)$.

$\Leftarrow$: Assume $\mathrm{op}(e'') = \mathrm{DELETE}(x)$ and
$e'' \to e'$. $x \notin V[e'']$ by the code for DELETE(x). By
restriction R2, there is an immediate predecessor f of $e''$ such
that $x \in \mathrm{view}(f)$. By condition V1, there is an
$e \in E$ such that $\mathrm{op}(e) = \mathrm{INSERT}(x)$ and
$e \to f$. Thus, $\mathrm{ins}_x[e'']$ by Lemma 3, so
$\mathrm{del}_x[e'']$. By Lemma 4, $\mathrm{del}_x[e']$. □

We now show the correctness of our algorithm:

Theorem: For all $e' \in E$, $\mathrm{view}(e') = V[e']$.

Proof: Suppose $x \in \mathrm{view}(e')$. By condition V1, there
exists $e \to e'$ such that $\mathrm{op}(e) = \mathrm{INSERT}(x)$.
By Lemma 3, $\mathrm{ins}_x[e']$. By condition V2, for every $e''$
with $\mathrm{op}(e'') = \mathrm{DELETE}(x)$, $e'' \not\to e'$.
Hence, we can apply Lemma 5 to conclude $\neg\mathrm{del}_x[e']$, so
$x \in V[e']$.

Now suppose $x \in V[e']$. By Lemma 2, $e \to e'$ and
$\mathrm{op}(e) = \mathrm{INSERT}(x)$ for some $e \in E$. Hence,
condition V1 holds for $e'$. Also, $\mathrm{ins}_x[e']$ holds by
Lemma 3. Let $\mathrm{op}(e'') = \mathrm{DELETE}(x)$. Since
$\neg\mathrm{del}_x[e']$, we conclude from Lemma 5 that
$e'' \not\to e'$. Thus, condition V2 holds for $e'$, so
$x \in \mathrm{view}(e')$.

We conclude that $\mathrm{view}(e') = V[e']$. □
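
The theorem can also be exercised empirically. The Python sketch below implements the five operations of Section 4 and compares each node's view, after every event of a random execution, against an unbounded-storage reference that simply records all known insertions and deletions. The class names, the random driver, and the modelling of lost and duplicated messages (messages stay in a pool, may be delivered repeatedly or never) are assumptions of this sketch, and a passing run illustrates the theorem rather than proving it.

```python
import random


def is_deleted(view, posting, x):
    """del(V, t, x) for elements tagged as (name, cre, T)."""
    return x not in view and x[2] <= posting[x[1]]


class Node:
    """One node of the algorithm: current view plus posting-time table."""

    def __init__(self, ident, n):
        self.id = ident
        self.clock = 0
        self.view = set()                       # V_i
        self.t = {k: 0 for k in range(n)}       # t_i

    def insert(self, name):
        self.clock += 1                         # clock_i strictly increases
        self.t[self.id] = self.clock
        x = (name, self.id, self.clock)         # tag <cre_x, T_x>
        self.view.add(x)
        return x

    def delete(self, x):
        self.view.discard(x)

    def send(self):
        return (frozenset(self.view), dict(self.t))

    def receive(self, message):
        v, t = message
        self.view = {x for x in self.view | v
                     if not is_deleted(self.view, self.t, x)
                     and not is_deleted(v, t, x)}
        self.t = {k: max(self.t[k], t[k]) for k in self.t}


class Reference:
    """Unbounded-storage reference: all known insertions and deletions."""

    def __init__(self):
        self.ins, self.dele = set(), set()

    def view(self):
        return self.ins - self.dele


def run(seed, n=3, steps=200):
    """Random execution; True if algorithm and reference agree after each event."""
    rng = random.Random(seed)
    nodes = [Node(i, n) for i in range(n)]
    refs = [Reference() for _ in range(n)]
    pool = []                                   # (dest, message, reference snapshot)
    for step in range(steps):
        i = rng.randrange(n)
        choice = rng.random()
        if choice < 0.3:
            x = nodes[i].insert(f"e{step}")     # fresh name: restriction R1
            refs[i].ins.add(x)
        elif choice < 0.5 and nodes[i].view:    # R2: delete only from the view
            x = rng.choice(sorted(nodes[i].view))
            nodes[i].delete(x)
            refs[i].dele.add(x)
        elif choice < 0.75:
            snapshot = (frozenset(refs[i].ins), frozenset(refs[i].dele))
            pool.append((rng.randrange(n), nodes[i].send(), snapshot))
        elif pool:                              # any order, possibly repeated
            i, msg, (ins, dele) = rng.choice(pool)
            nodes[i].receive(msg)
            refs[i].ins |= ins
            refs[i].dele |= dele
        if nodes[i].view != refs[i].view():
            return False
    return True
```

Calling `run(seed)` for a handful of seeds should return True each time; the reference plays the role of view(e), since its sets contain exactly the insertions and deletions that the node knows about.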

6. Remarks and Open Problems

We have not yet addressed the problem of finding a good strategy for
the nodes to use in deciding when and how to communicate.

If each message can be received by only a single process, then
various strategies can be imagined. At one extreme, a message
transmission from i to j could be attempted periodically for all
pairs i, j, $i \ne j$, resulting in a total of $O(N^2)$ messages to
propagate information between all pairs of nodes. On the other hand,
given a spanning tree in the network and a root, one can propagate
information from every node to every other using only $O(N)$
messages by first sending a wave of messages up from the leaves to
the root and then back down from the root to the leaves. However,
recovering from a network or node failure requires a special
recovery procedure since the spanning tree must be rebuilt. We leave
as an open problem to find a robust $O(N)$-message algorithm for
propagating data throughout the system.
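
The message counts for the two strategies can be checked with a small Python sketch. It assumes a failure-free run and a spanning tree given as a hypothetical child-list dictionary; each node's "knowledge" is modelled as the set of node ids whose information it has seen, standing in for the messages exchanged by the algorithm.

```python
def propagate(children, root):
    """Wave up to the root, then back down; returns (knowledge, message count)."""
    knowledge = {}
    messages = 0

    def up(node):
        nonlocal messages
        known = {node}
        for child in children.get(node, []):
            known |= up(child)      # child sends its accumulated knowledge up
            messages += 1
        knowledge[node] = known
        return known

    def down(node):
        nonlocal messages
        for child in children.get(node, []):
            knowledge[child] = knowledge[child] | knowledge[node]
            messages += 1           # parent sends the merged knowledge down
            down(child)

    up(root)
    down(root)
    return knowledge, messages


tree = {0: [1, 2], 1: [3, 4], 2: [5]}   # hypothetical 6-node spanning tree
knowledge, messages = propagate(tree, root=0)
n = len(knowledge)
all_pairs = n * (n - 1)                 # one message per ordered pair
```

For this tree the two waves use 2(N - 1) messages, one per tree edge in each direction, against N(N - 1) for the all-pairs strategy, and every node ends up with information from every other node.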

If a broadcast facility is available, then things are much simpler,
for each node need only broadcast a single message. There is still
the problem, however, of how often to do so. It is clearly not
sufficient for a node to broadcast only when it has new information,
for a node restarting after a failure must have some means for being
brought up-to-date. Of course, various protocols can be imagined to
handle such situations, and we leave that also as an open problem.